-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[clang] [diagnostics] Stable IDs for Clang diagnostics #168153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
SARIF diagnostics require that each rule have a stable `id` property to identify that rule across runs, even when the compiler or analysis tool has changed. We were previously setting the `id` property to the numeric value of the enum value for that diagnostic within the Clang implementation; this value changes whenever an unrelated diagnostic is inserted or removed earlier in the list. This change sets the `id` property to the _text_ of that same enum value. This value would only change if someone renames the enum value for that diagnostic, which should happen much less frequently than renumbering. For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for `deprecatedIds`, which let a rule specify the IDs by which it was previously known. This would let us rename, split, or combine diagnostics while still being able to correlate the new diagnostic IDs with older SARIF logs and/or suppressions. Nothing in this change affects how warnings are configured on the command line or in `#pragma clang diagnostic`. Those still use warning groups, not the stable IDs.
|
@llvm/pr-subscribers-clang Author: Dave Bartolomeo (dbartol) ChangesSARIF diagnostics require that each rule have a stable This change sets the For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for Nothing in this change affects how warnings are configured on the command line or in Full diff: https://github.com/llvm/llvm-project/pull/168153.diff 4 Files Affected:
diff --git a/clang/include/clang/Basic/DiagnosticIDs.h b/clang/include/clang/Basic/DiagnosticIDs.h
index 06446cf580389..9fc49325205a2 100644
--- a/clang/include/clang/Basic/DiagnosticIDs.h
+++ b/clang/include/clang/Basic/DiagnosticIDs.h
@@ -332,6 +332,9 @@ class DiagnosticIDs : public RefCountedBase<DiagnosticIDs> {
/// Given a diagnostic ID, return a description of the issue.
StringRef getDescription(unsigned DiagID) const;
+ /// Given a diagnostic ID, return the stable ID of the diagnostic.
+ std::string getStableID(unsigned DiagID) const;
+
/// Return true if the unmapped diagnostic levelof the specified
/// diagnostic ID is a Warning or Extension.
///
diff --git a/clang/lib/Basic/DiagnosticIDs.cpp b/clang/lib/Basic/DiagnosticIDs.cpp
index a1d9d0f34d20d..e3903a3edadfd 100644
--- a/clang/lib/Basic/DiagnosticIDs.cpp
+++ b/clang/lib/Basic/DiagnosticIDs.cpp
@@ -51,6 +51,22 @@ const StaticDiagInfoDescriptionStringTable StaticDiagInfoDescriptions = {
#undef DIAG
};
+struct StaticDiagInfoStableIDStringTable {
+#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
+ SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY) \
+ char ENUM##_id[sizeof(#ENUM)];
+#include "clang/Basic/AllDiagnosticKinds.inc"
+#undef DIAG
+};
+
+const StaticDiagInfoStableIDStringTable StaticDiagInfoStableIDs = {
+#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
+ SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY) \
+ #ENUM,
+#include "clang/Basic/AllDiagnosticKinds.inc"
+#undef DIAG
+};
+
extern const StaticDiagInfoRec StaticDiagInfo[];
// Stored separately from StaticDiagInfoRec to pack better. Otherwise,
@@ -63,6 +79,14 @@ const uint32_t StaticDiagInfoDescriptionOffsets[] = {
#undef DIAG
};
+const uint32_t StaticDiagInfoStableIDOffsets[] = {
+#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
+ SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY) \
+ offsetof(StaticDiagInfoStableIDStringTable, ENUM##_id),
+#include "clang/Basic/AllDiagnosticKinds.inc"
+#undef DIAG
+};
+
enum DiagnosticClass {
CLASS_NOTE = DiagnosticIDs::CLASS_NOTE,
CLASS_REMARK = DiagnosticIDs::CLASS_REMARK,
@@ -95,6 +119,7 @@ struct StaticDiagInfoRec {
uint16_t Deferrable : 1;
uint16_t DescriptionLen;
+ uint16_t StableIDLen;
unsigned getOptionGroupIndex() const {
return OptionGroupIndex;
@@ -107,6 +132,14 @@ struct StaticDiagInfoRec {
return StringRef(&Table[StringOffset], DescriptionLen);
}
+ StringRef getStableID() const {
+ size_t MyIndex = this - &StaticDiagInfo[0];
+ uint32_t StringOffset = StaticDiagInfoStableIDOffsets[MyIndex];
+ const char *Table =
+ reinterpret_cast<const char *>(&StaticDiagInfoStableIDs);
+ return StringRef(&Table[StringOffset], StableIDLen);
+ }
+
diag::Flavor getFlavor() const {
return Class == CLASS_REMARK ? diag::Flavor::Remark
: diag::Flavor::WarningOrError;
@@ -159,7 +192,8 @@ const StaticDiagInfoRec StaticDiagInfo[] = {
SHOWINSYSMACRO, \
GROUP, \
DEFERRABLE, \
- STR_SIZE(DESC, uint16_t)},
+ STR_SIZE(DESC, uint16_t), \
+ STR_SIZE(#ENUM, uint16_t)},
#include "clang/Basic/DiagnosticCommonKinds.inc"
#include "clang/Basic/DiagnosticDriverKinds.inc"
#include "clang/Basic/DiagnosticFrontendKinds.inc"
@@ -434,6 +468,15 @@ StringRef DiagnosticIDs::getDescription(unsigned DiagID) const {
return CustomDiagInfo->getDescription(DiagID).GetDescription();
}
+/// getIDString - Given a diagnostic ID, return the stable ID of the diagnostic.
+std::string DiagnosticIDs::getStableID(unsigned DiagID) const {
+ if (const StaticDiagInfoRec *Info = GetDiagInfo(DiagID))
+ return Info->getStableID().str();
+ assert(CustomDiagInfo && "Invalid CustomDiagInfo");
+ // TODO: Stable IDs for custom diagnostics?
+ return std::to_string(DiagID);
+}
+
static DiagnosticIDs::Level toLevel(diag::Severity SV) {
switch (SV) {
case diag::Severity::Ignored:
diff --git a/clang/lib/Frontend/SARIFDiagnostic.cpp b/clang/lib/Frontend/SARIFDiagnostic.cpp
index ac27d7480de3e..ac72a7af05429 100644
--- a/clang/lib/Frontend/SARIFDiagnostic.cpp
+++ b/clang/lib/Frontend/SARIFDiagnostic.cpp
@@ -46,7 +46,9 @@ void SARIFDiagnostic::emitDiagnosticMessage(
if (!Diag)
return;
- SarifRule Rule = SarifRule::create().setRuleId(std::to_string(Diag->getID()));
+ std::string StableID =
+ Diag->getDiags()->getDiagnosticIDs()->getStableID(Diag->getID());
+ SarifRule Rule = SarifRule::create().setRuleId(StableID);
Rule = addDiagnosticLevelToRule(Rule, Level);
diff --git a/clang/test/Frontend/sarif-diagnostics.cpp b/clang/test/Frontend/sarif-diagnostics.cpp
index 767c5802ca13d..3f7adb80c67fd 100644
--- a/clang/test/Frontend/sarif-diagnostics.cpp
+++ b/clang/test/Frontend/sarif-diagnostics.cpp
@@ -34,35 +34,35 @@ void f1(t1 x, t1 y) {
// Omit filepath to llvm project directory
// CHECK: test/Frontend/sarif-diagnostics.cpp"},"mimeType":"text/plain","roles":["resultFile"]}],"columnKind":"unicodeCodePoints","results":
// CHECK: [{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
-// CHECK: {"endColumn":1,"startColumn":1,"startLine":12}}}],"message":{"text":"'main' must return 'int'"},"ruleId":"{{[0-9]+}}","ruleIndex":0},
+// CHECK: {"endColumn":1,"startColumn":1,"startLine":12}}}],"message":{"text":"'main' must return 'int'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":0},
// CHECK: {"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":11,"startColumn":11,"startLine":13}}}],"message":{"text":"use of undeclared identifier
-// CHECK: 'hello'"},"ruleId":"{{[0-9]+}}","ruleIndex":1},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: 'hello'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":1},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":17,"startColumn":17,"startLine":15}}}],"message":{"text":"invalid digit 'a' in decimal
-// CHECK: constant"},"ruleId":"{{[0-9]+}}","ruleIndex":2},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: constant"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":2},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":5,"startColumn":5,"startLine":19}}}],"message":{"text":"misleading indentation; statement is not part
-// CHECK: of the previous 'if'"},"ruleId":"{{[0-9]+}}","ruleIndex":3},{"level":"note","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: of the previous 'if'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":3},{"level":"note","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":3,"startColumn":3,"startLine":17}}}],"message":{"text":"previous statement is
-// CHECK: here"},"ruleId":"{{[0-9]+}}","ruleIndex":4},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: here"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":4},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":10,"startColumn":10,"startLine":18}}}],"message":{"text":"unused variable
-// CHECK: 'Yes'"},"ruleId":"{{[0-9]+}}","ruleIndex":5},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
+// CHECK: 'Yes'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":5},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":12,"startColumn":12,"startLine":21}}}],"message":{"text":"use of undeclared identifier
-// CHECK: 'hi'"},"ruleId":"{{[0-9]+}}","ruleIndex":6},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
+// CHECK: 'hi'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":6},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":1,"startColumn":1,"startLine":23}}}],"message":{"text":"extraneous closing brace
-// CHECK: ('}')"},"ruleId":"{{[0-9]+}}","ruleIndex":7},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
+// CHECK: ('}')"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":7},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":6,"endLine":27,"startColumn":5,"startLine":27}}},{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":10,"endLine":27,"startColumn":9,"startLine":27}}},{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":7,"startColumn":7,"startLine":27}}}],"message":{"text":"invalid operands to binary expression ('t1' and
-// CHECK: 't1')"},"ruleId":"{{[0-9]+}}","ruleIndex":8}],"tool":{"driver":{"fullName":"","informationUri":"https://clang.llvm.org/docs/
+// CHECK: 't1')"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":8}],"tool":{"driver":{"fullName":"","informationUri":"https://clang.llvm.org/docs/
// CHECK: UsersManual.html","language":"en-US","name":"clang","rules":[{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"note","rank":-1},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"note","rank":-1},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":
-// CHECK: {"text":""},"id":"{{[0-9]+}}","name":""}],"version":"{{[0-9]+\.[0-9]+\.[0-9]+[^" ]*}}"}}}],"version":"2.1.0"}
+// CHECK: {"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""}],"version":"{{[0-9]+\.[0-9]+\.[0-9]+[^" ]*}}"}}}],"version":"2.1.0"}
// CHECK: 2 warnings and 6 errors generated.
|
Sirraide
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of this seems fine to me, but the RFC has to be accepted first before this can be merged.
I don’t think there is a good way to deal w/ custom diag ids. Those shouldn’t be used anywhere in Clang anyway (I know there are a few places that use them, but those should all be updated to use proper diagnostics instead).
|
The RFC has been approved to proceed by the Clang area team. I've updated the PR description to note a few open questions from the RFC, and how this PR chooses to answer each of those questions. |
|
From the Area Team meeting notes, regarding diagnostic IDs:
Should we automatically strip the prefix when generating the stable ID (either in the SARIF emitter, or in tablegen)? |
+1, those custom diagnostics should be using real diagnostics instead.
I think we may want an actual mapping instead; the problem is, we want the ability to rename any part of the diagnostic identifier because it is most readable when it matches the text of the diagnostic to some extent. The prefix happens to be something we change somewhat often as an example. But we sometimes reword diagnostics either because the diagnostic is now being used for more circumstances or because someone just found much better phrasing and it would be bad if our "stable" IDs either 1) weren't actually stable, or 2) were stable but prevented us from improving maintenance costs. In the PR summary you discuss this a bit with an idea of how we could improve it in the future, but I think that's going to be a problem sooner rather than later and we could consider addressing it up front. Because these IDs are not being exposed to the user (either documented on a webpage, displayed as part of the diagnostic text to the user, etc), should we auto-generate a UUID for the diagnostic entries which don't have one already associated with them yet (which is all of them at first), and use that as the stable ID? Then it doesn't matter how the diagnostic changes, the underlying UUID is still the same. I'm not certain of how we would persist this across builds, though, so it might be a terrible idea. But I'm hoping we can find something that 1) doesn't require the developer to manually pick an ID and put it in the .td file, 2) allows us to modify anything about the diagnostic we want without impacting the stable ID. WDYT? |
To persist across builds, the IDs would either have to be in the The SARIF stable ID doesn't have to be meaningful. If we went with UUIDs, someone adding a new diagnostic could just generate one easily enough. We'd just need a check somewhere to prevent duplicates in case of copy-paste mistakes. We could also just use explicitly-assigned, dense-ish stable integer IDs, MSVC-style. Those are easier to use in both written and oral conversation than UUIDs, but they might require slightly more process. We'd do an initial pass to add them to the I suppose we could split the difference and use one of those code phrase generators to make diagnostic IDs. Easier to use in conversation than UUIDs, but less likely to collide than manually-assigned dense integers. The downside is that at some point, we'll find ourselves writing something like "We're getting too many false positive reports for purple-monkey-dishwasher, so we'll need to tune the heuristcs." Does any one of the above approaches seem workable? |
Yeah, and we don't have a way to persist information across builds that I'm aware of.
Yeah, that's why I didn't think we'd want to use dense integers. With UUIDs, downstreams have far less risk of conflicts.
Yeah, that does split the middle, but seems like a nightmare for people adding new diagnostics because they have to figure out a code phrase for each one, and I'm a bit worried about new contributor experience from that too.
Maybe, but what about this as an idea: we have a script that generates a UUID and gives it a What do you think of something along those lines? It's a bit heavier of a process to add a diagnostic than it is today, but we could actually make this a nicer script that does more than just SARIF IDs. e.g., maybe this script generates the full diagnostic for you. Like: (the tool could even automatically add the diagnostic text to the correct .td files directly instead of making people copy and paste it, or have a Maybe I'm scope-creeping too. :-D |
Part of the implementation of [RFC] Emitting Auditable SARIF Logs from Clang
SARIF diagnostics require that each rule have a stable
idproperty to identify that rule across runs, even when the compiler or analysis tool has changed. We were previously setting theidproperty to the numeric value of the enum value for that diagnostic within the Clang implementation; this value changes whenever an unrelated diagnostic is inserted or removed earlier in the list.This change sets the
idproperty to the text of that same enum value. This value would only change if someone renames the enum value for that diagnostic, which should happen much less frequently than renumbering.For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for
deprecatedIds, which let a rule specify the IDs by which it was previously known. This would let us rename, split, or combine diagnostics while still being able to correlate the new diagnostic IDs with older SARIF logs and/or suppressions.Nothing in this change affects how warnings are configured on the command line or in
#pragma clang diagnostic. Those still use warning groups, not the stable IDs.Potential discussion topics
From @AaronBallman on the RFC:
As a starting point, this PR proposes the following answers to those open questions:
diagtoolin the future if users find it relevant.